Modelling Speech Signals using Formant Frequencies as an Intermediate Representation
نویسندگان
چکیده
This paper concerns Multiple-level Segmental HiddenMarkov Models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation. New TIMIT phone recognition results are presented, confirming that the theoretical upper-bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information which is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, while these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of non-linear formant-to-acoustic mappings.
منابع مشابه
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...
متن کاملA new Automatic Formant Tracking approach based on scalogram maxima detection using complex wavelets
In this paper we present a new formant tracking algorithm where the formant frequencies estimation was based on local maxima detection of a time frequency representation. This representation can be shown by a scalogram issued from a complex wavelet transform. The formant frequency candidates are validated as local maxima of scalogram which correspond to wavelet ridges. Then in the proposed algo...
متن کاملHMM-based MAP Prediction o Formant Frequencies from N
This paper describes how formant frequencies of voiced and unvoiced speech can be predicted from mel-frequency cepstral coefficients (MFCC) vectors using maximum a posteriori (MAP) estimation within a hidden Markov model (HMM) framework. Gaussian mixture models (GMMs) are used to model the local joint density of MFCCs and formant frequencies. More localised prediction is achieved by modelling s...
متن کاملFormants Estimation Techniques for Speech Analysis
Measuring formant frequencies in speech signals is indispensable for the search and technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, there is no totally effective method to allow good valuations of these frequencies. This paper presents a comparative study of two techniques of speech parameteriz...
متن کاملFormant enhancement based speech watermarking for tampering detection
Unauthorized tampering in speech signals has brought serious problems when verifying the originality and integrity of speech signals. Digital watermarking can effectively check if the original signals have been tampered by embedding digital data into them. This paper proposes a tampering detection scheme for speech signals based on formant enhancement-based watermarking. Watermarks are embedded...
متن کامل